
    Models of Social Groups in Blogosphere Based on Information about Comment Addressees and Sentiments

    This work analyzes the number, sizes and other characteristics of groups identified in the blogosphere using a set of models of social relations. The models differ in how they identify social relations, depending on the method used to classify the addressee of a comment (either the post author or the author of the comment to which this comment directly replies) and on a sentiment score computed for each comment from the statistics and connotation of the words it contains. The state of a selected blog portal was analyzed in sequential, partly overlapping time intervals. Groups in each interval were identified using a version of the CPM algorithm; on the basis of these, stable groups, existing for at least a minimal assumed duration, were identified.
    Comment: Gliwa B., Koźlak J., Zygmunt A., Models of Social Groups in Blogosphere Based on Information about Comment Addressees and Sentiments, in K. Aberer et al. (Eds.): SocInfo 2012, LNCS 7710, pp. 475-488, Best Paper Award
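    A minimal sketch of the windowed detection scheme the abstract describes, using networkx's k_clique_communities (an implementation of CPM). The window construction, the k value, the Jaccard threshold and the chaining rule are illustrative assumptions, not the paper's exact parameters.

```python
# A sketch of windowed CPM group detection and stable-group tracking,
# assuming comment relations arrive as author-author edges per window.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

def groups_per_window(edges_by_window, k=3):
    """Detect CPM communities independently in each (overlapping) window."""
    result = []
    for edges in edges_by_window:                 # edges: (author_a, author_b) pairs
        g = nx.Graph(edges)
        result.append([set(c) for c in k_clique_communities(g, k)])
    return result

def stable_groups(windows, min_duration=3, min_jaccard=0.5):
    """Chain groups across consecutive windows by member overlap; a group is
    stable when its chain persists for at least min_duration windows."""
    active = [[g] for g in windows[0]]
    finished = []
    for window in windows[1:]:
        next_active, used = [], set()
        for chain in active:
            prev = chain[-1]
            match = next((g for g in window
                          if len(prev & g) / len(prev | g) >= min_jaccard), None)
            if match is None:
                finished.append(chain)            # chain broken: close it
            else:
                next_active.append(chain + [match])
                used.add(id(match))
        # groups not matched to any chain start new chains
        next_active += [[g] for g in window if id(g) not in used]
        active = next_active
    return [c for c in finished + active if len(c) >= min_duration]
```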

    Validation of Memory Accesses Through Symbolic Analyses

    The C programming language does not prevent out-of-bounds memory accesses. There exist several techniques to secure C programs; however, these methods tend to slow down these programs substantially, because they populate the binary code with runtime checks. To deal with this problem, we have designed and tested two static analyses, symbolic region and range analysis, which we combine to remove the majority of these guards. In addition to the analyses themselves, we bring two other contributions. First, we describe live range splitting strategies that improve the efficiency and the precision of our analyses. Second, we show how to deal with integer overflows, a phenomenon that can compromise the correctness of static algorithms that validate memory accesses. We validate our claims by incorporating our findings into AddressSanitizer. We generate SPEC CINT 2006 code that is 17% faster and 9% more energy efficient than the code produced originally by this tool. Furthermore, our approach is 50% more effective than Pentagons, a state-of-the-art analysis to sanitize memory accesses.
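    The following toy sketch illustrates the general idea behind range analysis for bounds-check elimination: if the inferred range of an index provably fits inside the allocated region, the runtime guard is redundant. The Range abstraction and the check are simplified assumptions, not the symbolic analyses implemented in the paper.

```python
# Toy interval-based bounds-check elimination: a guard `0 <= i < len(a)`
# is redundant when the inferred range of i lies inside [0, len(a) - 1].
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    lo: int          # inclusive lower bound of the index variable
    hi: int          # inclusive upper bound of the index variable

def guard_is_redundant(index: Range, array_len: int) -> bool:
    """True when the access a[i] is provably in bounds for all i in range."""
    return index.lo >= 0 and index.hi <= array_len - 1

# For `for (i = 0; i < n; i++) a[i] = 0;` with a of length n, the loop
# gives i the range [0, n - 1], so every access is in bounds.
n = 100
assert guard_is_redundant(Range(0, n - 1), n)      # guard removable
assert not guard_is_redundant(Range(0, n), n)      # off-by-one: keep the guard
```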

    A rhesus macaque (Macaca mulatta) model of aerosol-exposure brucellosis (Brucella suis): pathology and diagnostic implications

    The US Centers for Disease Control and Prevention lists Brucella as a potential bioterrorism threat requiring enhanced diagnostic capacity and surveillance (http://emergency.cdc.gov/bioterrorism/). Successful treatment and management of patients after exposure to biological threat agents depends on accurate and timely diagnosis, but many biothreat agents present with similar, vague clinical signs, commonly referred to as ‘flu-like illness’. Diagnosis of brucellosis is notoriously challenging, especially early in infection, and definitive diagnosis may require invasive methods, e.g. bone marrow biopsy. We studied the pathogenesis of Brucella suis aerosol infection in rhesus macaques in an effort to guide the diagnostic algorithm in case of possible intentional exposure of humans. The rhesus macaque proved to be an excellent model for human brucellosis; the data showed that PCR DNA amplification testing of non-invasive diagnostic samples has the potential to definitively detect a point-source outbreak immediately and for several days after exposure.

    Modern web technologies

    Nowadays, the World Wide Web is one of the most significant tools that people employ to seek information, locate new sources of knowledge, communicate, share ideas and experiences, or purchase products and make online bookings. This book chapter discusses the technologies adopted by modern Web applications. We summarize the most fundamental principles employed by the Web, such as the client-server model and the HTTP protocol, and then present current trends such as asynchronous communication, distributed applications, cloud computing and mobile Web applications. Finally, we briefly discuss the future of the Web and the technologies that are going to play key roles in the deployment of novel applications. © 2011 Springer-Verlag Berlin Heidelberg
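    As a concrete illustration of the client-server model and the HTTP request/response cycle the chapter summarizes, the following self-contained sketch runs a tiny server and queries it; it uses only the Python standard library and is not drawn from the chapter itself.

```python
# A tiny client-server round trip over HTTP using only the standard
# library; the port number is an arbitrary choice for the example.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class Hello(BaseHTTPRequestHandler):
    def do_GET(self):                              # server side: answer a GET
        body = b"hello, web"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

server = HTTPServer(("127.0.0.1", 8080), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

print(urlopen("http://127.0.0.1:8080/").read())    # client side: request/response
server.shutdown()
```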

    Improving opinionated blog retrieval effectiveness with quality measures and temporal features

    The massive acceptance and usage of blog communities by a significant portion of Web users have rendered knowledge extraction from blogs a particularly important research field. One of the most interesting related problems is opinionated retrieval, that is, the retrieval of blog entries which contain opinions about a topic. There has been a remarkable amount of work towards improving the effectiveness of opinion retrieval systems. The primary objective of these systems is to retrieve blog posts which are both relevant to a given query and contain opinions, and to generate a ranked list of the retrieved documents according to the relevance and opinion scores. Although a wide variety of effective opinion retrieval methods have been proposed, to the best of our knowledge, none of them takes into consideration the importance of the retrieved opinions. In this work we introduce a ranking model which combines existing retrieval strategies with query-independent information to enhance the ranking of opinionated documents. More specifically, our model accounts for the influence of the blogger who authored an opinion, the reputation of the blog site which published a specific post, and the impact of the post itself. Furthermore, we extend the current proximity-based opinion scoring strategies by considering the physical locations of the query and opinion terms within a document. We conduct extensive experiments with the TREC Blogs08 dataset which demonstrate that our methods enhance retrieval precision by a significant margin.
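    A hedged sketch of the kind of score combination described above: topical relevance and opinion evidence blended with query-independent priors (blogger influence, blog reputation, post impact), plus a proximity-aware opinion contribution. The weights, the multiplicative form and the Gaussian kernel are illustrative assumptions, not the paper's model.

```python
# Hypothetical score blending: all weights and functional forms below are
# assumptions made for this sketch.
import math

def opinionated_score(relevance, opinion, blogger_influence,
                      blog_reputation, post_impact, w=(0.5, 0.3, 0.2)):
    """Blend topical relevance and opinion evidence with document priors."""
    prior = (w[0] * blogger_influence
             + w[1] * blog_reputation
             + w[2] * post_impact)
    return relevance * opinion * (1.0 + prior)

def proximity_opinion(query_positions, opinion_positions, sigma=5.0):
    """Opinion terms close to a query term contribute more (Gaussian kernel)."""
    return sum(math.exp(-((q - o) ** 2) / (2 * sigma ** 2))
               for q in query_positions for o in opinion_positions)
```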

    Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations

    The problem of matching product titles is of particular interest to both users and marketers. The former frequently search the Web with the aim of comparing prices and characteristics, or obtaining and aggregating information provided by other users. The latter often require wide knowledge of competitive policies, prices and features to organize a promotional campaign about a group of products. To address this problem, recent studies have attempted to enrich product titles by exploiting Web search engines. More specifically, these methods submit a query for each product title; after the results have been collected, the most important words appearing in the results are identified and appended to the titles. Subsequently, each word is assigned an importance score and, finally, a similarity measure is applied to identify whether two or more titles refer to the same product. Nonetheless, these methods have multiple problems, including poor scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach which addresses all these issues and is based on the morphological analysis of the product titles. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and record several statistics, such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest-scoring combination is then declared the label of the cluster which contains each product. The experimental evaluation of the algorithm on a real-world dataset demonstrated that, compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution. © 2018 IEEE
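    The two-phase idea lends itself to a short sketch: enumerate k-combinations of the title words, score each combination by how widely it is shared across a candidate cluster, and declare the best one the cluster label. The scoring below is a simplified assumption; the paper also exploits word proximity and frequency statistics.

```python
# Unsupervised title matching by shared word combinations; the scoring
# rule here (count of titles sharing a combination) is an assumption.
import re
from collections import Counter
from itertools import combinations

def word_combinations(title, k=3):
    words = re.findall(r"\w+", title.lower())
    return {frozenset(c) for c in combinations(words, min(k, len(words)))}

def cluster_label(titles, k=3):
    counts = Counter()
    for t in titles:
        counts.update(word_combinations(t, k))
    # prefer combinations shared by many titles; break ties by combination size
    best = max(counts, key=lambda c: (counts[c], len(c)))
    return best, counts[best]

titles = ["apple iphone 12 64gb black",
          "iphone 12 black 64gb by apple",
          "apple iphone 12 (64gb, black)"]
print(cluster_label(titles))       # a combination such as {'iphone', '12', '64gb'}
```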

    A supervised machine learning classification algorithm for research articles

    The automatic classification of research articles into one or more fields of science is of primary importance for scientific databases and digital libraries. A sophisticated classification strategy renders searching more effective and assists users in locating similar relevant items. Although most publishing services require authors to categorize their articles themselves, there are still cases where older documents remain unclassified, or the taxonomy changes over time. In this work we attempt to address this problem by introducing a machine learning algorithm which combines several parameters and metadata of a research article. In particular, our model exploits the training set to correlate keywords, authors, co-authorship, and publishing journals with a number of labels of the taxonomy. Subsequently, it applies this information to classify the rest of the documents. The experiments we conducted with a large dataset comprising about 1.5 million articles demonstrate that, in this specific application, our model outperforms the AdaBoost.MH and SVM methods. Copyright 2013 ACM
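    A minimal sketch of the metadata-driven idea: keywords, author names and the publishing journal are folded into a single feature string and correlated with taxonomy labels by a standard supervised learner. The choice of TF-IDF features with logistic regression is an assumption for illustration, not the model proposed in the paper.

```python
# Hypothetical pipeline: TF-IDF over field-prefixed metadata tokens,
# trained with logistic regression (not the paper's model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def featurize(article):
    # prefix each field so identical tokens from different fields stay distinct
    return " ".join(["kw_" + k for k in article["keywords"]]
                    + ["au_" + a for a in article["authors"]]
                    + ["jr_" + article["journal"]])

train = [{"keywords": ["inverted", "index"], "authors": ["smith"],
          "journal": "sigir", "label": "information_retrieval"},
         {"keywords": ["brucella", "aerosol"], "authors": ["jones"],
          "journal": "jmm", "label": "microbiology"}]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([featurize(a) for a in train], [a["label"] for a in train])
print(model.predict([featurize({"keywords": ["ranking", "index"],
                                "authors": ["smith"], "journal": "sigir"})]))
```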

    Positional data organization and compression in web inverted indexes

    To sustain the tremendous workloads they face on a daily basis, Web search engines employ highly compressed data structures known as inverted indexes. Previous works demonstrated that organizing the inverted lists of the index in individual blocks of postings leads to significant efficiency improvements. Moreover, the recent literature has shown that current state-of-the-art compression strategies such as PForDelta and VSEncoding perform well when used to encode the lists' docIDs. In this paper we examine their performance when used to compress the positional values. We expose their drawbacks and introduce PFBC, a simple yet efficient encoding scheme which encodes the positional data of an inverted list block using a fixed number of bits. PFBC allows direct access to the required data by avoiding costly look-ups and unnecessary information decoding, achieving several times faster position decompression than the state-of-the-art approaches. © 2012 Springer-Verlag
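    The core PFBC idea admits a compact sketch: every positional value in a block is stored with the same fixed number of bits (enough for the block's largest value), so the i-th position can be fetched by offset arithmetic alone, without decoding its predecessors. The byte-level layout below is illustrative, not the paper's exact format.

```python
# Fixed-bit block coding for positions: every value in the block uses the
# same width, so the i-th value sits at bit offset i * bits.
def pack_block(positions):
    bits = max(1, max(positions).bit_length())     # fixed width for this block
    buf = 0
    for i, p in enumerate(positions):              # place value i at offset i*bits
        buf |= p << (i * bits)
    nbytes = (len(positions) * bits + 7) // 8
    return bits, buf.to_bytes(nbytes, "little")

def fetch(bits, data, i):
    """Direct access to the i-th position without decoding its predecessors."""
    buf = int.from_bytes(data, "little")
    return (buf >> (i * bits)) & ((1 << bits) - 1)

bits, blob = pack_block([3, 17, 42, 99, 100])
assert fetch(bits, blob, 2) == 42                  # random access into the block
```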

    Computing scientometrics in large-scale academic search engines with MapReduce

    Apart from the well-established facility of searching for research articles, modern academic search engines also provide information regarding the scientists themselves. Until recently, this information was limited to the articles each scientist has authored, accompanied by their corresponding citations. Presently, the most popular scientific databases have enriched this information with scientometrics, that is, metrics which evaluate the research activity of a scientist. Although the computation of scientometrics is relatively easy when dealing with small data sets, at larger scales the problem becomes more challenging, since the data involved is huge and cannot be handled efficiently by a single workstation. In this paper we address this problem by employing MapReduce, a distributed, fault-tolerant framework for solving problems at large scale without considering complex network programming details. We demonstrate that by formulating the problem in a manner compatible with MapReduce, we can achieve an effective and scalable solution. We propose four algorithms which exploit the features of the framework, and we compare their efficiency by conducting experiments on a large dataset comprising roughly 1.8 million scientific documents. © 2012 Springer-Verlag
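    As an illustration of how such a computation can be phrased in MapReduce terms, the sketch below computes the h-index per author with a map phase, a simulated shuffle, and a reduce phase. It is only one possible formulation; the paper proposes and compares four algorithms.

```python
# h-index as a MapReduce job, with the shuffle step simulated locally.
from collections import defaultdict

def map_phase(doc):
    # emit one (author, citation_count) pair per author of the document
    for author in doc["authors"]:
        yield author, doc["citations"]

def reduce_phase(author, citation_counts):
    # h-index: the largest h such that h papers have at least h citations each
    counts = sorted(citation_counts, reverse=True)
    return author, sum(1 for i, c in enumerate(counts, start=1) if c >= i)

docs = [{"authors": ["alice", "bob"], "citations": 10},
        {"authors": ["alice"], "citations": 3},
        {"authors": ["alice"], "citations": 1}]

shuffled = defaultdict(list)                       # the framework's shuffle step
for doc in docs:
    for author, c in map_phase(doc):
        shuffled[author].append(c)
print([reduce_phase(a, cs) for a, cs in shuffled.items()])
# alice has papers with 10, 3, 1 citations -> h = 2; bob has one paper -> h = 1
```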

    An iterative distance-based model for unsupervised weighted rank aggregation

    Rank aggregation is a popular problem in which ranked lists from various sources (frequently called voters or judges) are combined to generate a single aggregated list with improved ranking of its items. In this context, a portion of the existing methods attempt to address the problem by treating all voters equally. Nevertheless, several related works have shown that carefully assigning a different weight to each voter leads to enhanced performance. In this article, we introduce an unsupervised algorithm for learning the weights of the voters for a specific topic or query. The proposed method is based on the premise that if a voter has submitted numerous elements which are placed in high positions in the aggregated list, then this voter should be treated as an expert, compared to voters whose suggestions appear in lower places or do not appear at all. The algorithm iteratively computes the distance between each input list and the aggregated list and modifies the weights of the voters until all weights converge. The effectiveness of the proposed method is experimentally demonstrated by aggregating input lists from six TREC conferences. © 2019 Association for Computing Machinery
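    A hedged sketch of the iterative scheme described above: voters whose lists agree with the current aggregate receive larger weights, and the aggregate is recomputed until the weights stabilize. The Borda-style scoring and the displacement distance are assumptions for illustration, not the paper's exact model.

```python
# Iterative weight learning: distance to the current aggregate drives the
# next weight assignment; scoring and distance are illustrative choices.
def aggregate(lists, weights):
    scores = {}
    for w, lst in zip(weights, lists):
        for rank, item in enumerate(lst):          # Borda-style positional score
            scores[item] = scores.get(item, 0.0) + w * (len(lst) - rank)
    return sorted(scores, key=scores.get, reverse=True)

def distance(lst, agg):
    # mean rank displacement of the list's items relative to the aggregate
    pos = {item: r for r, item in enumerate(agg)}
    return sum(abs(r - pos.get(item, len(agg)))
               for r, item in enumerate(lst)) / len(lst)

def learn_weights(lists, iters=50, eps=1e-6):
    weights = [1.0 / len(lists)] * len(lists)
    for _ in range(iters):
        agg = aggregate(lists, weights)
        raw = [1.0 / (1.0 + distance(lst, agg)) for lst in lists]
        new = [r / sum(raw) for r in raw]          # voters near the aggregate gain weight
        if max(abs(a - b) for a, b in zip(new, weights)) < eps:
            break
        weights = new
    return weights, aggregate(lists, weights)

lists = [["a", "b", "c"], ["a", "c", "b"], ["c", "b", "a"]]
print(learn_weights(lists))
```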